
    Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

    Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making a long-running lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost, and resource allocation strategies for recomputation under commonly used resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies. Funding: National Science Foundation (U.S.) (CISE Expeditions Award CCF-1139158); Lawrence Berkeley National Laboratory (Award 7076018); United States. Defense Advanced Research Projects Agency (XData Award FA8750-12-2-0331).
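    The core idea of pushing lineage into the storage layer can be sketched in a few lines. This is a hedged toy model, not Tachyon's actual API: the `Dataset` class, its `op`/`inputs` fields, and the `get` method are hypothetical names used only to illustrate recomputing a lost in-memory dataset from its lineage instead of replicating it on write.

    ```python
    # Toy sketch of lineage-based recovery (hypothetical names, not Tachyon's API):
    # each dataset records the operation and inputs that produced it, so a lost
    # in-memory copy can be recomputed on demand rather than replicated on write.

    class Dataset:
        def __init__(self, name, op=None, inputs=()):
            self.name = name
            self.op = op            # function that (re)computes this dataset
            self.inputs = inputs    # upstream datasets (the lineage)
            self.data = None        # in-memory copy; may be lost on failure

        def get(self):
            if self.data is None:                    # lost or never materialized
                parents = [p.get() for p in self.inputs]
                self.data = self.op(*parents)        # recompute from lineage
            return self.data

    # Usage: derive a dataset, simulate losing it, and recover by recomputation.
    raw = Dataset("raw")
    raw.data = [1, 2, 3, 4]
    doubled = Dataset("doubled", op=lambda xs: [2 * x for x in xs], inputs=(raw,))
    doubled.get()            # materialize once
    doubled.data = None      # simulate a node failure losing the in-memory copy
    assert doubled.get() == [2, 4, 6, 8]   # recovered without any replica
    ```

    The real system additionally bounds how deep such recomputation chains can grow, which is what the checkpointing algorithm mentioned above is for.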

    Static analysis-based approaches for secure software development

    Software security is a matter of major concern for software development enterprises that wish to deliver highly secure software products to their customers. Static analysis is considered one of the most effective mechanisms for adding security to software products. The multitude of static analysis tools that are available provide a large number of raw results that may contain security-relevant information, which may be useful for the production of secure software. Several mechanisms that can facilitate the production of both secure and reliable software applications have been proposed over the years. In this paper, two such mechanisms, particularly the vulnerability prediction models (VPMs) and the optimum checkpoint recommendation (OCR) mechanisms, are theoretically examined, while their potential improvement by using static analysis is also investigated. In particular, we review the most significant contributions regarding these mechanisms, identify their most important open issues, and propose directions for future research, emphasizing the potential adoption of static analysis for addressing the identified open issues. Hence, this paper can act as a reference for researchers who wish to contribute to these subfields, in order to gain a solid understanding of the existing solutions and their open issues that require further research.

    An Efficient Technique for Tracking Nondeterministic Execution and its Applications

    This report describes a technique for using instruction counters to track nondeterminism in the execution of operating system kernels and user programs. The operating system records the number of instructions between consecutive nondeterministic events and information about their nature during normal operation. During an analysis phase, the execution is repeated under the control of a monitor, and the nondeterministic events are applied at the same instructions as during the monitored execution. We describe the application of this technique to four areas. Performance monitoring: the technique can be used to instrument an operating system to capture long traces of memory references. Unlike current techniques, it performs the gathering in a postmortem phase and therefore has negligible effect on the computation itself during the monitoring phase. We expect trace periods that are longer than what existing techniques can capture by orders of magnitude, with little or no noticeable perturbation…
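    The record/replay idea above can be illustrated with a small simulation. This is a deliberate simplification under assumed names (`execute`, `event_steps`, `source`): real implementations count hardware instructions, whereas here a loop counter stands in, and nondeterministic inputs are logged by step number during recording and re-injected at the same step during replay.

    ```python
    import random

    # Toy illustration of instruction-counter replay (simplified: a loop counter
    # stands in for a hardware instruction counter). Recording logs (step, value)
    # for each nondeterministic event; replay re-injects each value at the same
    # step, making the rerun deterministic.

    def execute(steps, event_steps, source, log=None):
        state = 0
        for counter in range(1, steps + 1):
            if counter in event_steps:
                value = source(counter)           # nondeterministic input
                if log is not None:
                    log.append((counter, value))  # record step + value
                state = state * 2 + value
            else:
                state += 1
        return state

    # Record: nondeterministic inputs arrive at "instructions" 3 and 7.
    log = []
    recorded = execute(10, {3, 7}, lambda _: random.randint(0, 9), log=log)

    # Replay: feed the logged values back at the same steps.
    replayed = execute(10, {3, 7}, dict(log).__getitem__)
    assert replayed == recorded
    ```

    The same replay loop is what makes postmortem trace gathering possible: the expensive instrumentation runs only during the repeated, monitored execution, not the original one.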

    Storage Strategies for Fault-Tolerant Video Servers

    We consider the problem of providing high availability in cluster-based video servers. The cluster acts as a parallel processor that provides the aggregate I/O and network bandwidths of the component machines. In such an environment, the failure of one server may affect the availability of the video service or its quality. Existing approaches to this problem fall into two categories. On one hand, there are RAID-like schemes that store error correcting code (ECC) in addition to the video data. Should a failure occur, the unavailable data can be computed on the fly using the ECC and the service continues at the same quality. In cluster-based systems, however, the video data are distributed over several servers and there is no convenient point to reconstruct the missing blocks except at the client. Relying on the client for this task is not desirable, as it may not have the necessary buffering or processing capacity. On the other hand, some argue that the system could just continue operation…
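    The RAID-like reconstruction described above reduces, in its simplest single-parity form, to an XOR over the surviving blocks. The following is a minimal sketch under that assumption (the names `xor_blocks`, `data`, and `parity` are illustrative, not from the paper), showing why the client needs all surviving blocks in one place to rebuild a missing one.

    ```python
    # Minimal single-parity ECC sketch: store an XOR parity block alongside the
    # striped data blocks; any one missing block equals the XOR of the survivors
    # and the parity.

    def xor_blocks(blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    data = [b"frame-a1", b"frame-b2", b"frame-c3"]   # blocks on three servers
    parity = xor_blocks(data)                        # stored on a fourth server

    # Server holding data[1] fails: rebuild its block from survivors + parity.
    rebuilt = xor_blocks([data[0], data[2], parity])
    assert rebuilt == data[1]
    ```

    The paper's concern is exactly where this XOR should run: in a cluster the surviving blocks live on different machines, so the only node that naturally sees all of them is the client.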

    On the Relevance of Communication Costs of Rollback-Recovery Protocols

    Communication overhead has traditionally been the primary metric for evaluating rollback-recovery protocols. This paper reexamines the prominence of this metric in light of the recent increases in processor and network speeds. We introduce a new recovery algorithm for a family of rollback-recovery protocols based on logging. The new algorithm incurs a higher communication overhead during recovery than previous algorithms, but it requires less access to stable storage and imposes no restrictions on the execution of live processes. Experimental results show that the new algorithm performs better than one that is optimized for low communication overhead. These results suggest that in modern environments, latency in accessing stable storage and intrusion of a particular algorithm on the execution of live processes are more important than the number of messages exchanged during recovery.
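    The general checkpoint-plus-log recovery pattern the paper builds on can be sketched as follows. This is a generic, hedged illustration of log-based rollback recovery, not the paper's specific algorithm; the `Process` class and its methods are invented for the example.

    ```python
    # Generic sketch of log-based rollback recovery: checkpoint state
    # periodically, log received messages, and after a crash restore the last
    # checkpoint and deterministically replay the logged messages.

    class Process:
        def __init__(self):
            self.state = 0
            self.checkpoint = 0
            self.log = []                 # messages received since the checkpoint

        def receive(self, msg):
            self.log.append(msg)          # log (stably) before applying
            self.state += msg

        def take_checkpoint(self):
            self.checkpoint = self.state
            self.log.clear()              # older log entries can be garbage-collected

        def recover(self):
            self.state = self.checkpoint  # restore the last checkpoint
            for msg in self.log:          # replay logged messages in order
                self.state += msg

    p = Process()
    p.receive(5)
    p.take_checkpoint()
    p.receive(3)
    p.receive(4)
    before_crash = p.state
    p.state = -999        # simulate a crash corrupting volatile state
    p.recover()
    assert p.state == before_crash == 12
    ```

    The paper's point sits inside this pattern: how often `recover` touches stable storage, and whether live processes must block during it, can matter more than how many messages the replay exchanges.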

    Address trace compression through loop detection and reduction


    Low-Cost Garbage Collection for Causal Message Logging
